Skip to content

Fix hub notes race condition: await refresh inline#8

Merged
drewburchfield merged 3 commits intomasterfrom
fix/hub-notes-stale-counts
Apr 3, 2026
Merged

Fix hub notes race condition: await refresh inline#8
drewburchfield merged 3 commits intomasterfrom
fix/hub-notes-stale-counts

Conversation

@drewburchfield
Copy link
Copy Markdown
Owner

@drewburchfield drewburchfield commented Apr 3, 2026

Closes NAS-990

Summary

  • Changed _ensure_fresh_counts from fire-and-forget (asyncio.create_task) to inline await
  • Moved staleness check inside _refresh_lock to prevent TOCTOU race and duplicate refreshes
  • Extracted _do_refresh() (no lock) from _refresh_all_counts() (with lock)
  • Removed _handle_refresh_error callback (errors propagate through await)
  • Updated CONCURRENCY.md to reflect wait-not-skip behavior
  • Updated race condition test to verify exactly 1 refresh with 20 concurrent callers
  • Updated hub analyzer tests to match new atomic check+refresh pattern

Root cause

_ensure_fresh_counts() fired a background refresh and returned immediately. The query ran before the refresh completed, so first call to get_hub_notes always returned empty results.

Test plan

  • All 8 hub-related tests pass in Docker
  • Race condition test verifies exactly 1 refresh with 20 callers
  • Docker image builds and server initializes
  • Quality gate review: TOCTOU race fixed, docs and tests updated
  • Braintrust consultation (Claude + Gemini): unanimous on inline await approach

Open with Devin

_ensure_fresh_counts() was using asyncio.create_task() to refresh
connection counts in the background, causing get_hubs/get_orphans to
query stale data on the first call. Now awaits the refresh inline so
counts are always fresh before the query runs.

Changes:
- Await _refresh_all_counts() directly instead of create_task()
- Wait for in-progress refresh instead of skipping when lock is held
- Remove _handle_refresh_error callback (errors propagate via await)
- Update tests to match new wait-instead-of-skip behavior
- Move staleness check inside _refresh_lock to prevent duplicate refreshes
- Extract _do_refresh (no lock) from _refresh_all_counts (with lock)
- Update CONCURRENCY.md to reflect wait-not-skip behavior
- Rewrite race condition test to verify exactly 1 refresh with 20 callers
- Fix hub_analyzer tests to match new atomic check+refresh pattern
devin-ai-integration[bot]

This comment was marked as resolved.

- Fix test_staleness_check_skips_refresh_when_fresh to mock _do_refresh
  instead of _refresh_all_counts (vacuously true assertion)
- Release pool connection before calling _do_refresh to avoid holding
  an idle connection during the entire refresh operation
@drewburchfield drewburchfield merged commit 3db1f73 into master Apr 3, 2026
4 checks passed
@drewburchfield drewburchfield deleted the fix/hub-notes-stale-counts branch April 3, 2026 18:06
drewburchfield added a commit that referenced this pull request Apr 3, 2026
…me migration

- Revert connection_count back to = 0 on upsert (preserving counts breaks
  staleness detection which checks WHERE connection_count = 0; PR #8's
  inline-await already ensures fresh counts before queries)
- Add DROP TRIGGER/FUNCTION to initialize() so existing databases get the
  migration on server start, not just from schema.sql
drewburchfield added a commit that referenced this pull request Apr 3, 2026
…ons (#10)

* Fix code correctness issues (NAS-995)

- Remove schema trigger that overwrites modified_at with CURRENT_TIMESTAMP,
  preserving the real file mtime passed by the indexer
- Change hub_analyzer error type from ValueError to DatabaseError for
  consistency with the exception hierarchy
- Fix validation docstrings: document that functions raise ValidationError
  instead of claiming they clamp values
- Remove dead empty-content filter in indexer.py (already filtered at
  lines 119-122)
- Remove unused _refresh_all_counts method from hub_analyzer and update
  test to call _do_refresh directly
- Preserve connection_count on upsert (use notes.connection_count instead
  of resetting to 0) to avoid unnecessary full refreshes
- Add ORDER BY chunk_index to graph_builder LIMIT 1 query for deterministic
  chunk selection
- Fix weak test assertion: "pool" or "timeout" always evaluates True,
  changed to check membership in actual response text

* Add migration: drop trigger for existing databases

Existing databases retain the mtime-overwriting trigger even after
the schema file is updated. Add DROP TRIGGER/FUNCTION IF EXISTS
so the init script cleans up on restart. Also fix stale comment.

* Address Devin review: revert connection_count preservation, add runtime migration

- Revert connection_count back to = 0 on upsert (preserving counts breaks
  staleness detection which checks WHERE connection_count = 0; PR #8's
  inline-await already ensures fresh counts before queries)
- Add DROP TRIGGER/FUNCTION to initialize() so existing databases get the
  migration on server start, not just from schema.sql
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant